Localization of Electronic States in Chain
Models Based on Real DNA Sequence

Hiroaki Yamada

Aoyama 5-7-14-205, Niigata 950-2002, Japan 
Abstract

We investigate the localization properties of an electron in disordered two-
and three-chain systems (ladder models) with long-range correlation, as a
simple model of the electronic properties of a double strand of DNA. The
chains are constructed by repetition of the sugar-phosphate sites, and the
inter-chain hopping at the sugar sites comes from the nucleotide pairs, i.e.,
A-T or G-C pairs. It has been found that some DNA sequences have long-range
correlation. In this paper we investigate the localization properties of the
electronic states in some actual DNA sequences, such as bacteriophages of
Escherichia coli, human chromosome 22 and histone protein. We present
numerical results for the Lyapunov exponent (inverse localization length) of
the wave function in these cases, in comparison with the results for an
artificial sequence generated by an asymmetric modified Bernoulli map. It is
shown that the correlation and asymmetry of the sequence affect the
localization in both the artificial and the real DNA sequences.

Key words: DNA, Sequence, Electronic states, Correlation, Localization,
Delocalization, Lyapunov exponent
1 Introduction



Recent developments in nanoscale fabrication raise the prospect of using DNA
wires as molecular devices [1,2] and of realizing DNA computing [3]. Indeed,
modern techniques make it possible to measure DNA transport phenomena
directly [4,5]. Recently, Porath et al. measured the nonequilibrium
current-voltage (I-V) characteristics of the poly(G)-poly(C) DNA molecule
attached to platinum leads at room temperature [4]. Cuniberti et al.
explained the semiconducting behavior by considering the base-pair stack
coupled to the sugar-phosphate (SP) backbone pair [6]. Iguchi also derived
the semiconductivity and the band gap by using the ladder chain model of the
double strand of DNA [7]. In the above models, the existence of the SP
backbone chains plays an important role in the band structure, because the
hybridization of the energy levels opens a gap. Furthermore, recent ab initio
calculations on short segments show that the backbone chain of DNA might play
an important role in the entire electronic spectrum of the system [8,9].

On the other hand, Tran et al. measured the conductivity along the lambda
phage DNA (λ-DNA) double helix at microwave frequencies, using lyophilized
DNA both in and without a buffer [5]. The conductivity is strongly
temperature dependent around room temperature, with a crossover to a weakly
temperature dependent conductivity at low temperatures. Yu and Song showed
that the observed temperature dependence of the conductivity in DNA can be
consistently modeled, without invoking additional ionic conduction
mechanisms, by considering that electrons may use variable range hopping for
conduction and that electron localization is enhanced by strong thermal
structural fluctuations in DNA [10]. The DNA double helix is then viewed as a
one-dimensional Anderson system. Carpena et al. and Roche used real DNA
sequences as the on-site energies of a one-dimensional tight-binding system
to investigate the localization properties of the wave functions [11,12].

The transport properties of DNA are still controversial, mainly because of
the tremendous difficulties in setting up the proper experimental environment
and in handling the DNA molecule itself. Although many theoretical
explanations for the charge transport phenomena have been suggested on the
basis of standard solid-state-physics approaches, such as polarons, solitons
and the hole hopping model on guanine sites
[13,14,15,16,17,18,19,20,21,22,23,24,25,1], the situation is still far from a
unified theoretical scheme.

Moreover, as one of the realistic situations, it has been found that the base
(nucleotide) sequences of various genes have long-range correlation,
characterized by the power spectrum $S(f) \sim f^{-\alpha}$
($0.1 < \alpha < 0.8$) in the low-frequency limit ($f \ll 1$)
[26,27,28,29,30,31,32]. As observed in the power spectrum, the mutual
information analysis and the Zipf analysis of DNA base sequences such as the
human chromosome 22 (HCh-22), long-range structural correlation exists in the
total sequence, as well as short-range periodicity [29,30,31,32]. Eukaryotic
DNA sequences apparently have periodic repetition due to gene duplication.
The correlation length in the base sequences of genes changes from early
eukaryotes to late eukaryotes through the evolutionary process. It seems that
the long-range correlations tend to manifest themselves in the power spectra
of the total sequences rather than in those of the exon and intron parts
separately [31].

The localization properties of single-chain disordered systems with
long-range correlation have been extensively studied [33,34,35]. Accordingly,
it is very interesting to compare the localization of the electronic states
in real DNA sequences with that in disordered sequences with long-range
correlation. In the present paper, we numerically examine the localization of
the electronic states in some real DNA sequences, such as bacteriophages of
Escherichia coli (E. coli), HCh-22 and histone H1. We also investigate the
correlation effect on the localization properties of the one-electron states
in disordered chain models with long-range structural correlation. We present
numerical results for the Lyapunov exponents of the wave function. In
particular, it is found that the correlation of the sequence enhances the
localization length, and that the asymmetry of the distribution of the
elements in the sequence affects the localization.

Note that realistic values for the biological molecule, such as the
ionization energy [36] and the electronegativity, are not used in the
numerical calculations of the present paper. We use simpler parameter values
in order to show some basic localization and delocalization properties of the
electronic states in the ladder models. We mainly focus on (1) proposing the
model and (2) giving preliminary numerical results for the electronic
localization in the model with real DNA sequences.

The outline of the present paper is as follows. In the next section we
introduce the simple model for DNA used to investigate the electronic states.
In Sect. 3 we give the mapping rules for the real DNA sequences and the
asymmetric modified Bernoulli map used to generate correlated sequences
artificially. The numerical results for the Lyapunov exponent and the
localization length in these systems are given in Sect. 4. The last section
contains the summary and discussion.



2 Model 



We simplify and model the double strand of DNA under some assumptions. The
DNA double helix is constructed from two coupled single strands of DNA.
First, we ignore the twist of DNA as well as its complicated topology. In
addition, we consider only the π-electrons in the backbones and the base
pairs of the system. We also ignore the interaction between the electrons and
restrict ourselves to zero-temperature properties.

Following these basic assumptions, we consider the one-electron system
described by the tight-binding model consisting of two or three chains. The
ladder model was first introduced by Iguchi [7] as a model for the electronic
properties of a double strand of DNA. The Schrödinger equation is given as

$$
A_{n+1,n}\phi^A_{n+1} + A_{n,n-1}\phi^A_{n-1} + A_{n,n}\phi^A_n + V_n\phi^B_n = E\phi^A_n,
$$
$$
B_{n+1,n}\phi^B_{n+1} + B_{n,n-1}\phi^B_{n-1} + B_{n,n}\phi^B_n + V_n\phi^A_n = E\phi^B_n,
$$

where $A_{n+1,n}$ ($B_{n+1,n}$) is the hopping integral between the $n$th and
$(n+1)$th sites and $A_{n,n}$ ($B_{n,n}$) is the on-site energy at site $n$
in chain A (B), and $V_n$ is the hopping integral from chain A (B) to chain
B (A) at site $n$.
Fig. 1. Models of the double strand of DNA. (a) The two-chain model, (b) the
three-chain model, where S and P represent sugar and phosphate sites, respectively.

Furthermore, the equations can be rewritten in a matrix form by means of the
transfer matrix $T_d(n)$ at site $n$.

We would like to investigate the asymptotic behavior ($N \to \infty$) of the
products of the matrices $M_d(N) = \prod_{k=1}^{N} T_{d=2}(k)$ [37].
According to the parameter sets given by Iguchi [7], we set
$A_{n+1,n} = B_{n+1,n} = a$ ($b$) at odd (even) sites $n$, and $V_n = 0$ at
even sites (phosphate sites) for simplicity. Chains A and B are constructed
by repetition of the sugar-phosphate sites, and the inter-chain hopping $V_n$
at the sugar sites comes from the nucleotide base pairs, i.e., A-T or G-C
pairs. (See Fig. 1(a).)
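
As a sketch of what such a matrix form can look like (one standard choice, assumed here and not necessarily the normalization used in [37]), collect the two amplitudes at site $n$ into $\Phi_n = (\phi^A_n, \phi^B_n)^T$ and solve each of the two Schrödinger equations of the two-chain model for the amplitude at site $n+1$:

```latex
\begin{pmatrix} \Phi_{n+1} \\ \Phi_n \end{pmatrix}
= T_2(n) \begin{pmatrix} \Phi_n \\ \Phi_{n-1} \end{pmatrix},
\qquad
T_2(n) =
\begin{pmatrix}
\dfrac{E - A_{n,n}}{A_{n+1,n}} & -\dfrac{V_n}{A_{n+1,n}} & -\dfrac{A_{n,n-1}}{A_{n+1,n}} & 0 \\[2ex]
-\dfrac{V_n}{B_{n+1,n}} & \dfrac{E - B_{n,n}}{B_{n+1,n}} & 0 & -\dfrac{B_{n,n-1}}{B_{n+1,n}} \\[2ex]
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}.
```

With this choice $\det T_2(n) = A_{n,n-1} B_{n,n-1} / (A_{n+1,n} B_{n+1,n})$, so for the alternating hoppings $a$ and $b$ the product over one two-site period has unit determinant.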

The reduction to the single-chain system ($d = 1$) and the extension to the
three-chain system ($d = 3$) are straightforward [38]. In particular, when we
allow the electron to hop between nearest-neighbor nucleotide base pairs via
the overlap integral, as well as between the backbone sites, the two-chain
model can be easily extended to the three-chain one. The three-channel system
is described by the following Schrödinger equation:

$$
A_{n+1,n}\phi^A_{n+1} + A_{n,n-1}\phi^A_{n-1} + A_{n,n}\phi^A_n + V_n\phi^C_n = E\phi^A_n,
$$
$$
C_{n+2,n}\phi^C_{n+2} + C_{n,n-2}\phi^C_{n-2} + C_{n,n}\phi^C_n + V_n\phi^A_n + U_n\phi^B_n = E\phi^C_n,
$$
$$
B_{n+1,n}\phi^B_{n+1} + B_{n,n-1}\phi^B_{n-1} + B_{n,n}\phi^B_n + U_n\phi^C_n = E\phi^B_n,
$$

for odd sites $n$. On the other hand,

$$
A_{n+1,n}\phi^A_{n+1} + A_{n,n-1}\phi^A_{n-1} + A_{n,n}\phi^A_n = E\phi^A_n,
$$
$$
B_{n+1,n}\phi^B_{n+1} + B_{n,n-1}\phi^B_{n-1} + B_{n,n}\phi^B_n = E\phi^B_n,
$$

for even sites $n$. The geometry and setting of the three-chain model are
given in Fig. 1(b).



3 Correlated sequences 



As the correlated binary sequence $\{V_n\}$ of the hopping integrals, we use
real DNA sequences such as those of the bacteriophages of E. coli (phage-λ,
phage-186), HCh-22 and histone H1. The real DNA sequences are available from
gene databases [39].

When we convert a nucleotide sequence $\{S_n\}$ to numerical data $\{V_n\}$,
rules such as those used in DNA walk analyses are applied [27]:
(I) Purine-pyrimidine rule. If $S_n$ is a purine (A or G) then
$V_n = W_{AG}$; if $S_n$ is a pyrimidine (C or T) then $V_n = W_{CT}$, where
$W_{AG}$ and $W_{CT}$ denote moderate numerical values for the calculation.
(II) Hydrogen bond energy rule. $V_n = W_{GC}$ for strongly bonded pairs
(G-C) and $V_n = W_{AT}$ for weakly bonded pairs (A-T), where $W_{GC}$ and
$W_{AT}$ denote moderate numerical values for the calculation.
(III) Hybrid rule. $V_n = W_{AC}$ for A or C and $V_n = W_{GT}$ for G or T,
where $W_{AC}$ and $W_{GT}$ denote moderate numerical values for the
calculation. The hydrogen bond energy rule is the relevant one for
investigating electronic localization in the sequence of the DNA double
helix. We apply the hydrogen bond energy rule to the real DNA sequences in
the present paper.
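
The three conversion rules can be sketched in a few lines of Python. This is only an illustration: the names `RULES` and `map_sequence` and the weight values are hypothetical, chosen here as placeholders in the same spirit as the "moderate numerical values" mentioned above.

```python
# Each rule groups the four bases into two classes; the class label selects
# the weight W_xx. Names and values are illustrative assumptions.
RULES = {
    "purine_pyrimidine": {"A": "AG", "G": "AG", "C": "CT", "T": "CT"},  # rule (I)
    "hydrogen_bond": {"A": "AT", "T": "AT", "G": "GC", "C": "GC"},      # rule (II)
    "hybrid": {"A": "AC", "C": "AC", "G": "GT", "T": "GT"},             # rule (III)
}


def map_sequence(sequence, rule, weights):
    """Convert a nucleotide string {S_n} into numerical data {V_n}."""
    classes = RULES[rule]
    return [weights[classes[base]] for base in sequence.upper()]


# Hydrogen bond energy rule with placeholder weights W_AT = 1.0, W_GC = 0.5.
v = map_sequence("GATTACA", "hydrogen_bond", {"AT": 1.0, "GC": 0.5})
print(v)  # [0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0]
```

Keeping the rules in a lookup table makes it easy to rerun the same calculation under rule (I) or (III) for comparison.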

Moreover, we compare the characteristics of the localization with the results
for an artificial sequence generated by the following asymmetric modified
Bernoulli map [40]:

$$
X_{n+1} =
\begin{cases}
X_n + 2^{B_0-1} X_n^{B_0} & (X_n \in I_0), \\
X_n - 2^{B_1-1} (1 - X_n)^{B_1} & (X_n \in I_1),
\end{cases}
$$

where $I_0 = [0, 1/2)$ and $I_1 = [1/2, 1)$. $B_0$ and $B_1$ are the
bifurcation parameters that control the correlation of the sequence, and we
set $1 < B_0 < B_1 < 2$ for simplicity. The asymmetry of the map
($B_0 \neq B_1$) corresponds to the asymmetric distribution in the real
sequences of double-helix DNA, in which the number of A-T pairs does not
equal that of G-C pairs, in contrast to a random binary sequence with equal
weights. We introduce an indicator $R_{GC}$ for the fraction of G-C pairs in
a sequence, $R_{GC} = (N_G + N_C)/(N_G + N_C + N_A + N_T)$, where $N_G$,
$N_C$, $N_A$ and $N_T$ denote the numbers of the symbols G, C, A and T in the
sequence, respectively.
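
A minimal Python sketch of the map iteration and of the indicator $R_{GC}$ is given below. The function names are hypothetical, and the small clamp that keeps the orbit away from the fixed points 0 and 1 is a numerical safeguard added here, not part of the map itself.

```python
def bernoulli_step(x, b0, b1):
    """One step of the asymmetric modified Bernoulli map on [0, 1)."""
    if x < 0.5:  # x in I_0 = [0, 1/2)
        return x + 2.0 ** (b0 - 1.0) * x ** b0
    # x in I_1 = [1/2, 1)
    return x - 2.0 ** (b1 - 1.0) * (1.0 - x) ** b1


def bernoulli_orbit(x0, b0, b1, length):
    """Return [X_0, ..., X_{length-1}] generated from the seed x0."""
    tiny = 1e-12  # numerical safeguard (an assumption, not part of the map)
    orbit, x = [], x0
    for _ in range(length):
        orbit.append(x)
        x = bernoulli_step(x, b0, b1)
        # Rounding can push x onto the fixed points 0 and 1; nudge it back.
        x = min(max(x, tiny), 1.0 - tiny)
    return orbit


def gc_ratio(sequence):
    """R_GC = (N_G + N_C) / (N_G + N_C + N_A + N_T) for a nucleotide string."""
    counts = {base: sequence.upper().count(base) for base in "ACGT"}
    return (counts["G"] + counts["C"]) / sum(counts.values())


orbit = bernoulli_orbit(0.3, b0=1.7, b1=1.9, length=1000)
print(min(orbit) >= 0.0, max(orbit) < 1.0)  # True True
print(gc_ratio("GATTACA"))                  # 2 of the 7 bases are G or C
```

Note that for $B_0 = B_1 = 1$ both branches reduce to the binary shift, which loses all digits of a floating-point seed within a few dozen steps; a higher-precision number type would be needed for long orbits in that limit.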

In the ladder model we also use the symbolized sequences $\{V_n\}$ and/or
$\{U_n\}$, defined by the following rule, as the interchain hopping integrals
at odd sites $n$:

$$
V_n =
\begin{cases}
W_{AT} & (X_n \in I_0), \\
W_{GC} & (X_n \in I_1).
\end{cases}
\qquad (2)
$$

In the numerical calculations, $W_{GC}$ is set to half of $W_{AT}$ for
simplicity ($W_{GC} = W_{AT}/2$). The artificial binary sequence can then be
roughly regarded as a base-pair sequence such as that observed in the λ-DNA
or the HCh-22. The correlation function $C_0(n)$
($= \langle V_{n_0} V_{n_0+n} \rangle$, $n_0 = 1$, $n$ even) decays as an
inverse power law depending on the value of $B_1$, as
$C_0(n) \sim n^{-(2-B_1)/(B_1-1)}$ for large $n$ ($3/2 < B_1 < 2$). The power
spectrum becomes $S(f) \sim f^{-(2B_1-3)/(B_1-1)}$ for small $f$. We focus on
the Gaussian and non-Gaussian stationary region ($1 < B_1 < 2$), which
corresponds to some real DNA base-pair sequences with
$S(f) \sim f^{-\alpha}$ ($0.2 < \alpha < 1$). There are various ways to
generate correlated sequences, as seen in studies of one-dimensional
disordered systems with long-range correlation, and attention must be paid to
their statistical properties. For example, when a correlated random walk with
a Hurst index is used to generate the correlated sequence, the sequence must
be rescaled by the variance of the fluctuation, because the fluctuation
diverges with the length of the sequence. In the stationary sequence
generated by the modified Bernoulli map, however, the fluctuation does not
diverge, because the sequence takes only two alternative values at each
site $n$.
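
The symbolization and a simple estimate of the correlation function can be sketched as follows. The function names are hypothetical; the assignment of $I_0$ to $W_{AT}$ and $I_1$ to $W_{GC}$ is one reading of the symbol rule, and subtracting the mean before averaging is an assumption made here so that the estimate decays toward zero for uncorrelated data.

```python
def symbolize(orbit, w_at=1.0):
    """Binary hopping sequence from map iterates, with W_GC = W_AT / 2.

    Assumes the reading I_0 -> W_AT, I_1 -> W_GC of the symbol rule.
    """
    w_gc = w_at / 2.0
    return [w_at if x < 0.5 else w_gc for x in orbit]


def autocorrelation(values, lag):
    """Sample estimate of <dV_m dV_{m+lag}>, with dV = V - mean(V)."""
    mean = sum(values) / len(values)
    pairs = len(values) - lag
    return sum((values[m] - mean) * (values[m + lag] - mean)
               for m in range(pairs)) / pairs


# Hand-made toy input that alternates between I_0 and I_1; in practice the
# iterates of the modified Bernoulli map would be passed instead.
v = symbolize([0.1, 0.7, 0.2, 0.9, 0.3, 0.6])
print(v)                      # [1.0, 0.5, 1.0, 0.5, 1.0, 0.5]
print(autocorrelation(v, 2))  # 0.0625: period-2 data repeats at lag 2
```

For a long map-generated sequence, plotting `autocorrelation(v, n)` against `n` on log-log axes is one way to check for a power-law decay.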



4 Numerical Result 



We give the numerical results for the energy dependence of the Lyapunov
exponents. They are defined by

$$
\gamma_i = \lim_{n \to \infty} \frac{1}{2n} \log \sigma_i\!\left(M_d(n)^{\dagger} M_d(n)\right), \qquad (3)
$$

where $\sigma_i(\cdots)$ denotes the $i$th eigenvalue of the matrix
$M_d(n)^{\dagger} M_d(n)$ [41]. As the transfer matrix $T_d(n)$ is
symplectic, the eigenvalues of $M_d(n)^{\dagger} M_d(n)$ have reciprocal
symmetry around unity, as
$e^{\gamma_1}, \ldots, e^{\gamma_d}, e^{-\gamma_d}, \ldots, e^{-\gamma_1}$,
where $\gamma_1 > \gamma_2 > \cdots > \gamma_d > 0$.
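
In practice the matrix product overflows long before the limit is reached, so the exponents are usually estimated with repeated re-orthonormalization. The sketch below uses Gram-Schmidt steps for the two-chain model; this is a standard substitute for the eigenvalue definition above, not necessarily the procedure used for the figures. The function names and the one-site transfer matrix are assumptions of this sketch, with zero on-site energies and the default hoppings a = -1.0, b = -0.5 taken from the caption of Fig. 2.

```python
import math


def transfer_matrix(energy, v, t_next, t_prev):
    """4x4 one-site transfer matrix of the two-chain model (zero on-site energy).

    It maps (phi^A_n, phi^B_n, phi^A_{n-1}, phi^B_{n-1}) to
    (phi^A_{n+1}, phi^B_{n+1}, phi^A_n, phi^B_n).
    """
    return [
        [energy / t_next, -v / t_next, -t_prev / t_next, 0.0],
        [-v / t_next, energy / t_next, 0.0, -t_prev / t_next],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
    ]


def lyapunov_exponents(energy, v_odd, a=-1.0, b=-0.5):
    """Estimate the two nonnegative exponents (gamma_1, gamma_2) per site.

    v_odd[k] is the interchain hopping on the k-th odd (sugar) site; even
    (phosphate) sites carry V_n = 0.
    """
    basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    log_sums = [0.0, 0.0]
    n_sites = 2 * len(v_odd)
    for n in range(1, n_sites + 1):
        if n % 2 == 1:   # odd site: hopping a forward, b backward
            t = transfer_matrix(energy, v_odd[n // 2], a, b)
        else:            # even site: hopping b forward, a backward
            t = transfer_matrix(energy, 0.0, b, a)
        w0, w1 = ([sum(t[i][j] * vec[j] for j in range(4)) for i in range(4)]
                  for vec in basis)
        # Gram-Schmidt step: the stretching factors accumulate the exponents.
        norm0 = math.sqrt(sum(x * x for x in w0))
        q0 = [x / norm0 for x in w0]
        overlap = sum(x * y for x, y in zip(w1, q0))
        rest = [x - overlap * y for x, y in zip(w1, q0)]
        norm1 = math.sqrt(sum(x * x for x in rest))
        basis = [q0, [x / norm1 for x in rest]]
        log_sums[0] += math.log(norm0)
        log_sums[1] += math.log(norm1)
    return [s / n_sites for s in log_sums]


# Toy periodic hopping sequence with W_AT = 1.0 and W_GC = 0.5.
print(lyapunov_exponents(0.5, [1.0, 0.5, 0.5, 1.0] * 250))
```

A simple sanity check is the uniform, decoupled case (`a = b = -1.0`, all `V_n = 0`) at `energy = 3.0`, where both estimates should approach `math.acosh(1.5)`, the known decay rate outside the band of a uniform chain.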

Furthermore, in the thermodynamic limit ($n \to \infty$) the largest
channel-dependent localization length $\xi_d$ ($= \gamma_d^{-1}$) determines
the exponential decay of the Landauer conductance of the system between
metallic electrodes,
$g(n) = \sum_i (\cosh(2\gamma_i n) - 1)^{-1} \to \exp(-2\gamma_d n)$ in units
of $2e^2/h$ at zero temperature, and serves as the localization length of the
total system of coupled chains [37]. Recently, electron transport through a
molecular wire between two metallic electrodes has also been investigated by
several techniques [42].
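
As a small worked example of how the least exponent sets the length scale, the following Python sketch converts a set of exponents into $\xi_d$ and into the leading exponential decay of the conductance. The function names and the numerical values are purely illustrative assumptions.

```python
import math


def localization_length(gammas):
    """xi_d = 1 / gamma_d, with gamma_d the smallest positive exponent."""
    return 1.0 / min(gammas)


def asymptotic_conductance(gammas, n):
    """Leading large-n decay exp(-2 * gamma_d * n), in units of 2e^2/h."""
    return math.exp(-2.0 * n / localization_length(gammas))


gammas = [0.5, 0.1]                        # illustrative values only
print(localization_length(gammas))         # 10.0
print(asymptotic_conductance(gammas, 50))  # exp(-10), about 4.54e-05
```

Only the smallest exponent matters at large length: the channel with $\gamma_1 = 0.5$ contributes a factor of order $e^{-50}$ here and is negligible.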

We consider the correlation effect on the localization properties of the
disordered case by using real DNA sequences and the modified Bernoulli model.
We used a sample with system size N = 10^ for the numerical calculations with
the modified Bernoulli map. Note that perfect periodicity exists on the
deterministic (even) sites in our models (see Fig. 1). In addition, we
introduce long-range correlation through the base-pair sequence on the odd
sites, $V_{2n-1}$.

It is generally known that correlation in the sequence enhances the
delocalization of the electronic states. In the asymmetric modified Bernoulli
system characterized by the two parameters $B_0$ and $B_1$, the correlation
decay in long sequences depends on $B_1$, because we set $B_0 < B_1$.
Figure 2(a) shows the energy dependence of the Lyapunov exponents
($\gamma_1$ and $\gamma_2$) for some cases of the asymmetric modified
Bernoulli system, named as follows: case (1) $B_0 = 1.0$, $B_1 = 1.0$;
case (2) $B_0 = 1.0$, $B_1 = 1.9$; and case (3) $B_0 = 1.7$, $B_1 = 1.9$.
Case (1) is clearly more localized than cases (2) and (3) in the vicinity of
the band center, $|E| < 1$. The comparison between case (2) and case (3)
shows the effect of the asymmetry of the map on the localization:
$R_{GC} \simeq 0.2$ for case (2) and $R_{GC} \simeq 0.47$ for case (3). In
the energy regime $|E| > 1$, the Lyapunov exponent $\gamma_2$ in case (2) is
smaller than that in case (3), in spite of the same correlation strength
$B_1$. As a result, it is found that in the DNA ladder model the correlation
and asymmetry enhance the localization length $\xi$ ($= \gamma_2^{-1}$) of
the electronic states around $|E| < 1$, although the largest Lyapunov
exponent $\gamma_1$ is almost unchanged by these effects.

[Figure 2: panels (a)-(d), Lyapunov exponents versus energy from -3 to 3.
Legends: Case(1), Case(2), Case(3) in (a); HC22 1, HC22 2 in (b); Lambda, 186
in (c).]

Fig. 2. Lyapunov exponents ($\gamma_1$, $\gamma_2$) as a function of energy in
the ladder model, (a) modified Bernoulli model, (b) human chromosome 22,
(c) bacteriophages of E. coli (phage-λ, phage-186), (d) early histone H1 and
late histone H1. The on-site energy is set at $A_{nn} = B_{nn} = 0$,
$a = -1.0$, $b = -0.5$. The size of the sequence is N = 10^ for (a), N = 10^
for (b), $N = 48510$ for the phage-λ in (c), $N = 30624$ for the phage-186 in
(c), $N = 787$ for the early histone H1 in (d), and $N = 1182$ for the late
histone H1 in (d).

Figures 2(b), 2(c) and 2(d) show the nonnegative Lyapunov exponents for the
real DNA sequences of (b) HCh-22, (c) bacteriophages of E. coli and
(d) histone proteins. In the case of HCh-22, we used two sequences with
N = 10^, extracted from the original large DNA sequence. The result shows
that the Lyapunov exponents do not depend on the details of the differences
between the sequences in HCh-22. Although weak long-range correlation has
been observed in HCh-22, as mentioned in the introduction, it does not affect
the localization properties. The sequences we used are almost symmetric
($R_{GC} \simeq 0.5$).
Fig. 3. Localization length $\xi$ ($= \gamma_2^{-1}$) as a function of energy
in the ladder model, (a) bacteriophages (phage-λ, phage-186), (b) early
histone H1 and late histone H1. The parameters are the same as those in Fig. 2.

In Fig. 2(c) and (d) the least nonnegative Lyapunov exponents are influenced
by the differences in the sequences. The localization length
$\xi_d = 1/\gamma_d$, defined by the least nonnegative Lyapunov exponent, is
shown for the bacteriophages and histone H1 in Fig. 3(a) and (b). The
localization length of the phage-λ is clearly larger than that of the
phage-186. Moreover, it seems that the difference between the sequences of
early histone H1 and late histone H1 affects the resonance structure around
$|E| = 2$, although the difference does not change the localization
properties in the vicinity of the band center.



Furthermore, we have confirmed that properties almost the same as those of
the double-chain model are observed in the three-chain model. Figure 4 shows
the energy dependence of the Lyapunov exponents
($\gamma_1, \gamma_2, \gamma_3$) in the three-chain model. Figure 4(a) shows
the result for the asymmetric modified Bernoulli system. The correlation
and/or asymmetry of the sequence clearly change the second and third Lyapunov
exponents. In contrast, although the global features of $\gamma_1$ are almost
unchanged, the local structure of its energy dependence is changed by the
change of $B_0$. Figure 4(b) shows the results for phage-fd and phage-186 in
the three-chain model. It is found that the structure of the energy
dependence around $|E| < 2$ is different from that in the artificial sequence
generated by the modified Bernoulli map.
Fig. 4. Lyapunov exponents ($\gamma_1, \gamma_2, \gamma_3$) as a function of
energy in the three-chain model, (a) modified Bernoulli model,
(b) bacteriophages of E. coli (phage-fd, phage-186). The parameters are the
same as those in Fig. 2 except for the on-site energies of the C-chain
($C_{n,n} = 0$).



5 Summary and Discussion 



We introduced two-chain and three-chain models as simple models of the
electronic properties of the double strand of DNA. We numerically
investigated the correlation effect on the localization properties of the
one-electron states in the disordered two-chain (ladder) and three-chain
models with long-range structural correlation, by means of the asymmetric
modified Bernoulli model and some real DNA sequences. As a result, the
correlation enhances the localization length ($\gamma_d^{-1}$) around
$|E| < 1$, although $\gamma_1$ is almost unchanged. In addition to the
correlation effect, the asymmetry of the sequence also enhances the
localization length. Properties almost the same as those of the two-chain
model have been observed in the three-chain model.
A relation between the correlation length of the DNA sequence and the
evolutionary process has been suggested [31,43]. It would be interesting if
the localization properties were also related to the evolutionary process. So
far we have considered only the effect of the two-point correlation of the
sequence on the localization; the relation between the general complexity of
the sequence and the localization properties is also a very interesting
problem for the future [44,45].

Finally, it should be noted that in the present paper we have focused, for
the sake of simplicity, on the localization properties of the electronic
states at zero temperature, with emphasis on the ladder geometry formed by
the backbones and on the correlation of the DNA sequence. In experiments on
the conductance of DNA, however, both the temperature effect and the
temperature dependence become important. Indeed, finite temperature can also
reduce the effective system size and lead to changes in the transport
properties.

The author would like to thank Dr. Kazumoto Iguchi for stimulating and 
useful discussions. 